From dictionary to corpus to self-organizing dictionary: learning valency associations in the face of variation and change
Abstract
Abstracting over specific lexically-governed particles and prepositions and specific predicate selectional preferences, but including some 'derived' / 'alternant' semi-productive, and therefore only semi-predictable, bounded dependency constructions, such as particle or dative movement, there are at least 163 valency frames associated with verbal predicates in (current) English (Briscoe, 2000). In this paper, I will review the work that my colleagues and I have done to learn (semi-)automatically this very large number of associations between individual verbal predicates and valency frames. Access to a comprehensive and accurate valency lexicon is critical for the development of robust and accurate parsing technology capable of recovering predicate-argument relations (and thus logical forms) from free text or transcribed speech. Without this information it is possible to 'chunk' input into phrases, but not to distinguish arguments from adjuncts or to resolve most phrasal attachment ambiguities. Furthermore, for statistical parsers it is not enough to know the associations of predicates to valency frames; it is also critical to know the relative frequency of such associations given a specific predicate. Such information is a core component of that required to 'lexicalize' a probabilistic parser, and it is now well-established that lexicalization is essential for accurate disambiguation (e.g. Collins, 1997; Carroll et al., 1998). While state-of-the-art wide-coverage grammars of English, capable of recovering predicate-argument structure and expressed as a unification-based phrase structure grammar, have on the order of 1000 rules, it is clear that the number of associations between valency frames and predicates needed in a lexicon for such a grammar will be much higher.
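To make the notion of "relative frequency of such associations given a specific predicate" concrete, the following is a minimal sketch that estimates it as count(verb, frame) / count(verb) from a list of observed (verb, frame) pairs. It is an illustration only, not the acquisition method reviewed in the paper: the function name, the frame labels, and the toy observations are all hypothetical.

```python
from collections import Counter, defaultdict


def frame_relative_frequencies(observations):
    """Map each verb to {frame: count(verb, frame) / count(verb)}.

    `observations` is an iterable of (verb, frame) pairs, e.g. as they
    might be extracted from parsed corpus text (an assumption here).
    """
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    return {
        verb: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for verb, frames in counts.items()
    }


# Hypothetical toy data; the frame labels are invented for illustration.
observations = [
    ("give", "NP_NP"),
    ("give", "NP_PP"),
    ("give", "NP_NP"),
    ("give", "NP"),
    ("sleep", "INTRANS"),
]

lexicon = frame_relative_frequencies(observations)
for verb, frames in lexicon.items():
    print(verb, frames)
```

Plain relative frequencies like these are the simplest possible estimate; with sparse corpus data some form of filtering or smoothing would normally be needed, which this sketch does not attempt.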